Systematic Biology — Latest Matching Preprints

1

Phylogenetic inference from an incomplete fossil record

Hohmann, N.; Warnock, R. C. M.; Jarochowska, E.

2026-06-28 paleontology 10.64898/2026.06.24.734220 medRxiv

Top 0.1%

66.8%

Fossil data is crucial to construct phylogenetic time trees, which serve as the basis to test a wide range of evolutionary hypotheses. While the fossil record is known to be incomplete, modern stratigraphy provides predictions of the structure of the fossil record as expressed by gap location and duration. Advances in phylogenetic model development allow us to propagate this information into Bayesian phylogenetic inference in the form of priors on time-variable fossil sampling. However, the impact and role of stratigraphic architectures on time tree inference has so far remained unexplored. We introduce a novel simulation framework that combines realistic stratigraphic forward models with phylogenetic simulations. Using this framework, we examine (1) how stratigraphically plausible model violations of fossil sampling due to gaps affect total-evidence inference under the fossilized birth-death model and (2) if stratigraphic knowledge on gap duration and timing improves inference when incorporated in priors on fossil sampling. We find that total-evidence analysis is robust to stratigraphically plausible distribution of gaps in disparate stratigraphic architectures, with results being instead dominated by the number of morphological characters. Surprisingly, incorporating information on prominent gaps in the stratigraphic record does not improve phylogenetic inference. Our results suggest that phylogenetic inference is robust to model violations introduced by stratigraphic gaps over short timescales, with results being dominated by a priori known data availability constraints such as morphological character matrix size. This research establishes the foundations for joint modeling of phylogenetic and stratigraphic processes and narrows the knowledge gap between paleontology, stratigraphy, and neontology.

2

Guide-tree bias of whole genome alignment can mislead phylogenomic analyses

Tao, Q.; Grünewald, S.

2026-07-09 evolutionary biology 10.64898/2026.07.06.736671 medRxiv

Top 0.1%

26.4%

Whole-genome alignment (WGA) is widely used for genome-scale phylogenetic inference, and most scalable WGA pipelines rely on progressive alignment guided by a pre-specified tree. Among progressive whole-genome aligners, Progressive Cactus is a successful state-of-the-art method. However, analyses of real and simulated avian data indicate that guide-tree choice can influence downstream tree inference; star guide trees do not remove this effect and can exacerbate long-branch attraction artefacts. We have developed a consensus strategy based on the Progressive Cactus framework by generating a small set of alternative guide-tree alignments and retaining only homology relationships consistently recovered across all alignments. In simulation experiments, consensus alignments improve precision, bring inferred site-pattern frequency distributions closer to those of the true alignments, and recover more true splits than single guide-tree alignments. In a real landbird (Telluraves) dataset, we observe a strong bias towards single binary guide trees and long-branch attraction for less resolved trees. While the reconstructed tree still depends on the phylogenetic method and taxon sampling, our consensus alignment has no clear bias. We implemented a hierarchical consensus workflow that only locally resolves uncertainty in the guide tree. Therefore, the computational cost increases only moderately, for example by an estimated 68 percent for a recently published large-scale alignment of more than 300 modern bird (Neoaves) taxa.

3

Beyond infinite sites: Generalized ABBA-BABA statistic for deeper phylogenies

Zhang, C.; Nielsen, R.

2026-07-08 bioinformatics 10.64898/2026.07.06.736715 medRxiv

Top 0.1%

15.0%

Patterson's D statistic detects gene flow from ABBA-BABA site patterns, but its reliance on biallelic site patterns fails under deeper divergences, where multiple hits cause false positives. We propose two extensions, D+ and D*. Both incorporate multiallelic site patterns to reduce saturation bias under the JC and F84 models. Simulations show that D+ and D* both remain correctly null under all conditions and detect gene flow effectively, with distinct advantages: D+ guarantees non-negativity of the denominator, while D* provides greater robustness when mutation rates vary across genomic regions. The source code and binary files are publicly available at https://github.com/chaoszhang/ASTER.
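
For orientation, the classical biallelic statistic that this abstract sets out to generalize can be sketched as below. This is only the textbook Patterson's D, D = (nABBA - nBABA) / (nABBA + nBABA), computed from four aligned sequences; it is not the D+ or D* extension, whose definitions are not given in the abstract. The function name `patterson_d` and the convention of skipping non-ACGT characters are assumptions made for this illustration and are not taken from the ASTER code base.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Classical Patterson's D from four aligned sequences (P1, P2, P3, O).

    Only strictly biallelic sites are counted, which is the limitation
    the abstract describes for deep divergences.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        site = (a, b, c, o)
        # Assumption: skip gaps and ambiguity codes.
        if any(x not in "ACGT" for x in site):
            continue
        # Keep biallelic sites only; multiallelic sites are ignored here.
        if len(set(site)) != 2:
            continue
        if a == o and b == c:      # pattern ABBA
            abba += 1
        elif b == o and a == c:    # pattern BABA
            baba += 1
    total = abba + baba
    return (abba - baba) / total if total else 0.0
```

An excess of ABBA over BABA sites gives a positive value; under no gene flow and no multiple hits, the two counts are expected to be equal.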

4

Tiny Subsamples and Upsampling Tame Big Data Evolutionary Analysis in Phylogenomics

Kumar, S.; Tamura, K.; Sharma, S.

2026-06-23 evolutionary biology 10.64898/2026.06.21.733599 medRxiv

Top 0.2%

12.7%

Long runtime, high memory demands, and reliance on high-performance computing increasingly limit the evolutionary analysis of long phylogenomic datasets. We review a scalable framework based on phylogenomic subsampling and upsampling (PSU), in which many small subsamples of sites from a long concatenated sequence alignment are extended by upsampling prior to inference, and the resulting analyses are then aggregated to obtain stable evolutionary estimates. PSU exploits a useful distinction between the computational burden and the inferential power of statistical methods in molecular phylogenetics: computational cost is strongly influenced by the number of distinct site patterns in the concatenated alignment, whereas statistical power depends primarily on the amount of evolutionary information represented by sites and substitutions. By reducing the former while restoring the latter through upsampling, PSU can approximate many full-data analyses at substantially lower computational cost. Evidence from simulated and empirical datasets shows that PSU can accurately estimate bootstrap support values, select optimal substitution models, test evolutionary hypotheses, and infer branch lengths, divergence times, and associated uncertainty measures, while often reducing runtime and memory requirements by orders of magnitude. The same subsampling-upsampling-aggregation principle underlies all of these applications. PSU also provides distributions of inferred clade support across independent subsamples, enabling detection of concordant and conflicting phylogenetic signals that may remain hidden in conventional concatenated phylogenomic analyses. Adaptive procedures for selecting the subsample size, the number of subsamples, and the number of upsampling replicates make the framework practical across diverse datasets. We suggest that PSU is a general strategy for scalable phylogenomic inference across a broad range of statistical methods. 
By enabling rigorous analyses of genome-scale alignments on standard computing hardware, PSU expands access to computationally intensive evolutionary methods while reducing the environmental and infrastructural costs of big-data phylogenomics.
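
The subsample-then-upsample step described in this abstract can be pictured with a short sketch. It assumes that a subsample is a set of alignment columns drawn without replacement and that upsampling means resampling those columns with replacement back to the full alignment length; the abstract does not spell out either detail, so both are assumptions, and the function name `subsample_upsample` is hypothetical.

```python
import random


def subsample_upsample(n_sites, subsample_size, rng):
    """Return (subsample, upsampled) lists of site indices.

    Assumption: the subsample is drawn without replacement, and the
    upsampled replicate resamples it with replacement to n_sites columns.
    """
    subsample = rng.sample(range(n_sites), subsample_size)
    upsampled = rng.choices(subsample, k=n_sites)
    return subsample, upsampled


# Usage: one upsampled replicate from a 1,000-site alignment.
rng = random.Random(42)
sub, up = subsample_upsample(1000, 50, rng)
```

The upsampled replicate has the full number of sites but at most `subsample_size` distinct columns, which matches the abstract's point that cost tracks the number of distinct site patterns.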

5

How Robust are Multispecies Coalescent Species Delimitations in Taxonomically Complex Systems? A Genomic Assessment Using Mediterranean Tethya Sponges

van der Sprong, J.; Cardone, F.; Hoehna, S.; Schaetzle, S.; Deister, F.; Erpenbeck, D.; Woerheide, G.; Vargas, S.

2026-07-05 evolutionary biology 10.64898/2026.07.04.735074 medRxiv

Top 0.2%

12.4%

Reliable species delimitation underpins biodiversity assessment but remains difficult for organisms with plastic morphology and few diagnostic characters. Multispecies coalescent (MSC) methods can delimit species from genomic data, yet they are rarely tested in taxonomically complex marine invertebrate groups where they are arguably most needed. We used the three Mediterranean species of the genus Tethya, a rare, well-characterised system within the otherwise taxonomically difficult phylum Porifera (distinguished by multiple independent morphological and ecological characters), to evaluate how robust MSC-based delimitation is in such groups. Analysing 64 single-copy nuclear loci in BEAST2 and BPP, we compared constrained, hypothesis-testing approaches (BFD*, BFdriver, A10) with freer, heuristic ones (SPEEDEMON, A11), and examined their sensitivity to data type, clock model, priors, and the species-collapse threshold. All methods recovered the three recognised Mediterranean species, but the resolution of within-lineage structure was method-dependent. The hypothesis-testing approaches consistently supported six lineages, robustly across data types and model assumptions, whereas the heuristic approaches proved less stable. Configurations without a priori species hypotheses often failed to converge or were computationally intractable, a problem compounded by the relaxed clock. In SPEEDEMON the outcome changed with the collapse threshold. Because our system lacks an independent reference point to calibrate this threshold, any delimitation based on it is poorly constrained. We conclude that constrained, hypothesis-testing delimitation is the most robust and reproducible MSC approach, yielding a quantitative, model-based hypothesis that can be weighed against other lines of evidence to inform taxonomic decisions.
By clarifying how these methods behave and how their outcomes should be interpreted, our study offers a practical guide for researchers working on comparably complex systems.

6

Nuclear phylogenomics clarifies the family-level backbone and gene-tree conflict in Zingiberales

Wang, J.; Zhu, Q.; Chen, C.; Luo, Y.; He, J.

2026-07-01 evolutionary biology 10.64898/2026.06.25.734679 medRxiv

Top 0.2%

11.6%

Zingiberales includes eight morphologically distinctive families, but its family-level backbone has remained unstable, especially around Musaceae, Heliconiaceae, Lowiaceae, and Strelitziaceae. We analysed 1566 low-copy nuclear genes from 52 samples, representing all eight families and Pontederia crassipes as outgroup. Concatenated maximum likelihood and multispecies coalescent analyses recovered the same backbone: ((Zingiberaceae, Costaceae), (Cannaceae, Marantaceae)) is sister to (Musaceae, (Heliconiaceae, (Lowiaceae, Strelitziaceae))). Penalized-likelihood dating placed the sampled crown group in the Late Cretaceous, with several deep family-level divergences occurring on short internodes. Analysis of 1248 rerooted gene trees showed that conflict is concentrated on these deep branches and in several shallow clades. HyDe tests of empirical and simulated matrices, each including 62,475 triples, did not support widespread ancient hybridization among the major family-level lineages after filtering against the simulated null model. The nuclear data recover a stable Zingiberales backbone, and the long-standing instability of several deep nodes is best explained by rapid early divergence and extensive incomplete lineage sorting.

7

Pygopods are an exceptional radiation of snake-like geckos

Brennan, I. G.; Keogh, J. S.; Esquere, D.

2026-06-26 evolutionary biology 10.64898/2026.06.22.733657 medRxiv

Top 0.2%

7.0%

Limb loss in vertebrate animals is surprisingly common despite imposing strong functional constraints. These pressures funnel species towards regions of limited ecological and phenotypic space. To date, snakes have been considered unique in having escaped this pattern. Using a new species-level phylogeny and comparative morphological and dietary datasets, we show that pygopods, a group of limbless Australo-Papuan geckos, have undergone a similar evolutionary trajectory to snakes. Our analyses provide evidence of exceptional morphological and diet evolution. This is exemplified by strong niche partitioning among genera through dietary specialization and greater than expected dietary disparity. Diversification in pygopods has also been driven by extreme phenotypic evolution, with pygopods encompassing much of the morphological space covered by all other limb-reduced lizards. Interestingly, the diversification of pygopods has resulted in only a modest number of species, emphasizing the decoupling of diversity and richness possible in adaptive radiations.

8

Interspecies Differential Gene Expression Analysis with Regularized Phylogenetic Linear Models

Gallopin, M.; Daunesse, M.; Lespinet, O.; Liehrmann, A.; Bastide, P.

2026-07-03 evolutionary biology 10.64898/2026.06.30.734542 medRxiv

Top 0.2%

6.7%

Comparative transcriptomic datasets are increasingly used to investigate the molecular basis of phenotypic diversification across species. However, finding genes that are differentially expressed (DE) between lineages remains challenging, for two main reasons. First, the random evolutionary drift can blur the signal left by lineage-specific shifts in mean expression, and induces phylogenetic correlations that, if ignored, can widely inflate the False Discovery Rate (FDR), i.e., the amount of spuriously detected genes. Second, DE analysis from RNA-Seq data involves multiple testing on many genes for a small number of individual measurements with high noise, and requires dedicated statistical tools. Traditional DE tools, such as limma, and classical Phylogenetic Comparative Methods (PCMs), such as the Expression Variance and Evolution (EVE) model, are both designed to tackle one of these two challenges alone, but both fail in the context of inter-species RNA-Seq data. In this work, we present phyloDE, a new tool for inter-species DE, that aims at taking the best from both approaches. On simulations based on a recently published four-species rodent dataset, we show that, contrary to other methods, phyloDE correctly controls the FDR in all settings, while keeping a reasonable power. When reanalyzing the empirical dataset, phyloDE discovers more DE genes that exhibit consistent changes in their cis-regulatory landscape compared to EVE in all the experimental settings. The method is implemented in R, with an interface inheriting from limma.

9

CoalMiner: a coalescent model generator for fastsimcoal2

Esplin-Stout, R.; Sethuraman, A.

2026-06-30 evolutionary biology 10.64898/2026.06.25.734618 medRxiv

Top 0.3%

5.5%

Demographic inference using the Site Frequency Spectrum (SFS) is often constrained by the number and complexity of models tested. Here we present a coalescent model generator called CoalMiner for use with fastsimcoal2. CoalMiner utilizes a decision tree framework to generate biologically plausible models, with user input dictating the number and ranges of demographic parameters and histories, which can then be plugged into the fastsimcoal2 pipeline. Using extensive simulations and empirical data, we show that CoalMiner is an effective helper tool to explore demographic model space. CoalMiner is written in Python and is freely available on GitHub: https://github.com/raywray/coalminer with numerous vignettes and tutorials.

10

Towards a Unified Exact Solution of Rearrangement Small Parsimony for Natural Genomes

Bohnenkaemper, L.; Frolova, D.

2026-06-28 bioinformatics 10.64898/2026.06.23.733974 medRxiv

Top 0.3%

5.0%

Phylogenetic reconstruction is a fundamental problem in comparative genomics. As a theoretical problem in rearrangement studies, this has been modelled as the Small Parsimony Problem (SPP), in which ancestral genome structures have to be determined minimizing the number of rearrangement events occurring throughout the phylogeny. This problem is of significant interest in microbial and cancer genomics, due to the prevalence and clinical importance of rearrangement events. Genome structures in this problem are expressed as sequences of markers, which are themselves oriented sequence features (such as genes) that abstract from non-structural variations. Recent research has focused on the problem under the natural genomes model, in which arbitrary variations in copy number of markers are allowed. Natural genomes are often studied under the DCJ-indel model, a model which has already been successfully applied to plasmid data. There also exist ILP solutions to a variant of the Small Parsimony Problem under the DCJ-indel model. However, these solutions are limited in their applicability, as they make some critical simplifications for tractability purposes: ancestral marker frequencies and precomputed putative ancestral adjacencies, with their predicted likelihoods, are assumed as input. This creates multiple problems from both a theoretical and practical perspective. Firstly, this simplification means that the full state space is not searched for a solution, but rather only the subset of genomes with the precomputed putative adjacencies, meaning an optimal solution to the exact SPP is not guaranteed. Secondly, marker frequencies are given externally, without any theoretical guarantees. Thirdly, the method used to precompute adjacencies relies on gene trees, which requires the use of genes as markers, when gene annotation is often unreliable, especially in regions with a lot of rearrangement.
Additionally, this restricts the applicability of the approach to sets of genomes that are both divergent and large enough to be able to produce informative gene trees. This is, for example, rarely the case for plasmids, where nucleotide mutations are rarer than rearrangements and genomes are small. Hence, we revisit the problem to solve the exact SPP by introducing a cost to indel operations, which allows us to compute ranges of marker frequencies and derive theoretical results, that allow us to reduce the solution space that the ILP searches without sacrificing optimality. We show that this makes the problem tractable for the case of small and recently related genomes, first on simulated genomes, and then on a set of pathogenic plasmids which represent a realistic use case for the method.

11

Constrained body mass evolution and decoupled morphological rates in plesiosaurs

Zhao, R. J.; Zhang, C.

2026-06-29 paleontology 10.64898/2026.06.24.734298 medRxiv

Top 0.3%

4.8%

Body size, through its links to various physiological traits, has often been hypothesized to influence evolutionary rates. Negative body size-rate correlations have been reported in the morphological or molecular evolution of several extant vertebrate groups, including mammals, birds, reptiles, and teleost fishes. In this study, we estimated body masses for 89 species of plesiosaurs, a clade of Mesozoic aquatic reptiles, and found that their body size evolution conforms to a three-regime Ornstein-Uhlenbeck process, indicative of constrained evolution. Rates of morphological evolution, inferred using the skyline fossilized birth-death process and the variable-rates model, show minimal support for a correlation with body size in this clade. Our results thus serve as a counterexample, suggesting that the negative body size-rate relationship is not a universal vertebrate pattern, but rather a trend restricted to certain lineages.

12

OrthoGLMM: Phylogenetic Association Testing for Gene Content and Trait Evolution

Guhlin, J. G.; Keddell, P.; Dearden, P.

2026-07-06 bioinformatics 10.64898/2026.07.03.736217 medRxiv

Top 0.3%

4.7%

Motivation: Comparative genome projects can now assemble and annotate hundreds of species, creating an opportunity to test whether species-level traits are associated with repeated changes in gene content. These tests must account for shared ancestry, sparse orthogroups, rare trait origins, and thousands of simultaneous associations. Results: We present OrthoGLMM, a phylogenetically informed framework for the association of traits and orthogroup presence/absence or copy number across species. OrthoGLMM combines deterministic GLMM scans with solver-rerun empirical calibration and calibrated FDR estimation. In three benchmark datasets, OrthoGLMM recovered expected signals for bacterial diazotrophy, plant nodulation, and marine mammals. Availability and Implementation: Source code, documentation, example data, and reproducibility scripts will be available at http://github.com/jguhlin/OrthoGLMM.

13

Ancient Rapid Radiation Underlies Persistent Phylogenomic Conflict in Early Collembola Diversification

Cucini, C.; Moody, E. R.; Cicconardi, F.; Montgomery, S. H.

2026-07-09 evolutionary biology 10.64898/2026.07.05.736609 medRxiv

Top 0.3%

4.6%

Collembola (springtails) are among the most abundant and ecologically important soil arthropods, representing one of the oldest extant terrestrial hexapod lineages, with a fossil record extending to the early Devonian. Despite their relevance, phylogenetic relationships among the four extant orders (Entomobryomorpha, Poduromorpha, Symphypleona, and Neelipleona) have remained unresolved for over two decades. Here, we present the most comprehensive phylogenomic analysis of Collembola to date, comprising 1,127 single-copy orthologues from 145 taxa representing 19 families. To improve orthology inference, we developed a novel HMM-based filtering pipeline that significantly reduced hidden paralogy in BUSCO-derived datasets. Across multiple dataset configurations, gene-jackknife replicates, and various maximum-likelihood analyses, we consistently recovered Poduromorpha as the earliest-diverging lineage. Coalescent-based methods instead highlighted discordant arrangements characterised by extremely short internal branches and low quartet support, a pattern consistent with pervasive incomplete lineage sorting and reticulate evolutionary history. We further dissected the phylogenetic signal by exhaustively evaluating all possible inter-order topological arrangements, both on the full concatenated dataset and gene-by-gene, to identify the most phylogenetically informative loci. These analyses rejected the great majority of previously proposed hypotheses, narrowing support to only two statistically indistinguishable topologies (T11 and T4), with the Poduromorpha-first arrangement consistently favoured across both site-homogeneous and site-heterogeneous substitution models. Finally, with molecular dating, we estimated the origin of crown Collembola in the Early Devonian, with the diversification of the extant orders in the Carboniferous. 
Several extant genera were estimated to be older than many currently recognized families, highlighting the exceptional evolutionary persistence of springtail lineages and suggesting that lineage longevity should be considered when interpreting higher-level taxonomic diversity.

14

Generative continuous time model reveals epistatic signatures in protein evolution

Pagnani, A.; Barrat-Charlaix, P.

2026-07-10 bioinformatics 10.1101/2025.09.17.676821 medRxiv

Top 0.4%

3.9%

Protein evolution is fundamentally shaped by epistasis, where the effect of a mutation depends on the sequence context. As standard phylogenetic methods assume independently evolving sites, there is a need for more complex models based on accurate estimations of the fitness landscape. Good candidates are modern generative models -- such as the Potts model -- which successfully capture epistatic effects. However, recent work on generative evolutionary models usually uses discrete time, making these models difficult to integrate with the standard frameworks in evolutionary biology. We introduce a continuous-time sequence evolution model using the Gillespie algorithm and parameterized by a generative Potts model. This approach enables us to simulate realistic, family-specific evolutionary trajectories and allows for direct comparison with independent-site models. Surprisingly, we find that while epistasis significantly slows down evolution, it does not change the average evolutionary rates at individual sites. This is explained by the rate heterogeneity caused by context-dependence: we show that the rate at some positions varies from null to high values depending on the context, while other positions are essentially independent of the context. Finally, we show that epistasis leads to a systematic underestimation bias in the inference of evolutionary distance between sequences. Overall, our work provides a new tool for simulating realistic protein evolution and offers novel insights into the complex interplay between epistasis and evolutionary dynamics.
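
As a rough illustration of what a Gillespie simulation of context-dependent substitutions looks like, here is a minimal sketch. It is not the authors' model: the abstract does not state how substitution rates are derived from the Potts model, so the rate exp(-ΔE/2) used here is an assumption (chosen because it satisfies detailed balance with respect to exp(-E)), and the names `gillespie_evolve` and `toy_energy` are hypothetical. The toy energy simply rewards identical neighbouring residues, standing in for a real Potts Hamiltonian, and the alphabet is assumed to contain at least two symbols.

```python
import math
import random


def toy_energy(seq):
    """Toy stand-in for a Potts energy: identical neighbours lower it."""
    return -sum(x == y for x, y in zip(seq, seq[1:]))


def gillespie_evolve(seq, energy, alphabet, t_max, rng):
    """Evolve seq for time t_max; return (final sequence, substitutions)."""
    seq = list(seq)
    t, n_events = 0.0, 0
    while True:
        e_old = energy(seq)
        moves, rates = [], []
        # Enumerate every single-site substitution and its rate.
        for i in range(len(seq)):
            for a in alphabet:
                if a == seq[i]:
                    continue
                new = seq[:i] + [a] + seq[i + 1:]
                # Assumed rate form: exp(-dE / 2), context-dependent via E.
                rates.append(math.exp(-(energy(new) - e_old) / 2))
                moves.append((i, a))
        total = sum(rates)
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_max:
            return "".join(seq), n_events
        # Pick which substitution happens, proportionally to its rate.
        i, a = rng.choices(moves, weights=rates)[0]
        seq[i] = a
        n_events += 1
```

Because every rate is recomputed from the current sequence, the rate at a given site changes as its neighbours change, which is the context-dependence the abstract refers to.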

15

Genomic Distortion of Jawed Vertebrate Phylogeny

Brownstein, C.; Yang, L.; Dornburg, A.; Near, T. J.

2026-06-29 evolutionary biology 10.64898/2026.06.28.735080 medRxiv

Top 0.4%

3.1%

Reconstructing patterns of evolution requires understanding the interrelationships of species, yet evolutionary relationships that defy resolution and calibration in time are commonplace across the Tree of Life. Here, we investigate the dynamics of temporal and topological uncertainty by generating a phylogeny of jawed vertebrates using 1105 exonic loci sampled for 540 species spanning all major orders and most families of gnathostomes. Across loci and DNA sequence sites, we observe rapid reductions in statistical support for the monophyly of jawed vertebrate clades that originated around the Cretaceous-Paleogene mass extinction. Phylogenetic signal was scrambled to different degrees during rapid successive divergences in multiple unrelated jawed vertebrate lineages that radiated in this interval, including birds, snakes, placental mammals, and acanthomorph fishes. In addition to showing that particular events have modified phylogenetic signal across the same loci in distantly related vertebrate clades, we also demonstrate how rates of genomic evolution affect our ability to infer the timescale of vertebrate evolution. By testing how the inclusion of lineages of ray-finned fishes with very fast and slow rates of molecular evolution changes inferences of the vertebrate evolutionary timescale, we show that the deepest divergences in ray-finned fishes may be impossible to accurately infer using sequence data and calibrations from a limited fossil record. These results hint at the macroevolutionary realities underlying topological and divergence time uncertainty across evolutionary trees.

16

No silver bullet: Patterns of macrosynteny recapitulate systemic conflicts in the higher-level relationships of the arachnids

Kulkarni, S. S.; Klementz, B. C.; Ballesteros, J. A.; Abshire, K. M.; Cunha, T. J.; Hassan, M. K.; Laumer, E. M.; De Madeiros, B. A. S.; Neu, S. M.; Pankey, S.; Plachetzki, D. C.; Santibanez-Lopez, C. E. C.; Setton, E. V. W.; Varney, R. M.; Abdel-Rahman, M. A.; Hormiga, G.; Sharma, P. P.

2026-06-23 evolutionary biology 10.64898/2026.06.22.733561 medRxiv

Top 0.4%

3.1%

Rare genomic changes have long been sought by phylogeneticists for their potential to resolve obdurate nodes in the tree of life. Recently, patterns of macrosynteny have been proffered as a breakthrough for challenging relationships within invertebrates. One taxon that stands to benefit from the application of this approach is Chelicerata (the sister group to the rest of Arthropoda), whose radiation has long defied resolution, despite intensive investigations using morphological characters, molecular sequence data, and a combination thereof. Challenges to the resolution of chelicerate phylogeny include an ancient rapid radiation, the incidence of several fast-evolving lineages prone to long-branch attraction artifacts, and extinction of multiple ordinal level lineages that cannot be sampled for breaking long branches. At present, only a subset of nodes has been stably resolved. To break this impasse, we brought to bear multiple classes of phylogenetically informative rare genomic changes, including the sequencing of the first genomes for Ricinulei and Palpigradi. Here, we show that an ancient, shared whole genome duplication event is restricted to Arachnopulmonata (the most recent common ancestor of spiders and scorpions), disfavoring traditional placements of either Ricinulei or Palpigradi as close relatives of tetrapulmonates. Intriguingly, investigation of fusion-with-mixing events identified equal support for mutually exclusive placements for Acariformes, the least stable of the arachnid orders. Our results suggest that fusion-with-mixing, far from being a silver bullet, likely exhibits the same emergent property as all character systems, in that it is prone to homoplasy and conflicting signal stemming from ancient rapid radiations.

17

Complex interplay of biomechanics and ecology influenced crab claw morphology evolution

Bicknell, R. D. C.; Wolfe, J. M.; Flynn, J. J.; Klompmaker, A. A.; Chase, M.; Fu, P.; Hopkins, M.

2026-06-23 ecology 10.64898/2026.06.23.733945 medRxiv

Top 0.4%

3.1%

True crabs (Brachyura) are among the most iconic marine arthropods, representing noteworthy examples of morphological and ecological disparity. A striking feature of brachyurans is their anterior pincer-like appendages: chelipeds. These structures showcase a large diversity of morphologies that reflect ecology and overall multifunctionality. Yet, a comprehensive assessment of appendage functional morphology within phylogenetic and ecological trait contexts has never been attempted. By combining 3D geometric morphometrics, finite element analyses, multilocus molecular phylogeny, and ecological trait data for 80 crab species, including three fossil forms, we unveil a complex evolutionary history for crab chelipeds. Despite extreme shape diversity amongst chelipeds, stress distributions are very similar across taxa and hint at a many-to-one pattern. High concentrations of chelipeds within constrained morphospace regions associated with peak pinch forces illustrate that brachyuran morphologies optimised for shell crushing may have arisen in the Cretaceous. Deviations from this morphospace highlight the diversification of non-shell-crushing life modes and the influence of sexual selection on appendages. Neither cheliped shape nor pinch force shows phylogenetic signal. Together these results indicate that the evolution of cheliped shape is closely associated with, and inferred to have been strongly influenced by, crab ecology, biomechanical needs and sexual selection. SIGNIFICANCE STATEMENT: Chelipeds, the pincer-like claws of crabs, are among the most morphologically diverse appendages within Arthropoda, yet the evolutionary forces driving this diversity remain poorly understood. By integrating 3D geometric morphometrics, biomechanical modelling, molecular phylogeny, and ecological data across 80 crab species including fossil forms, we demonstrate that cheliped morphology is driven by ecology, biomechanical demands, and sexual selection rather than phylogenetic relatedness.
The multifunctionality of these structures produces strong evidence for many-to-one mapping of form to function. Morphologies optimised for durophagy appear to have originated in the Cretaceous, with subsequent diversification into manipulative and sexually selected forms from a morphologically flexible foundation. These findings demonstrate that cheliped diversity reflects a complex interplay between ecological specialisation, biomechanical optimisation, and sexual selection across Brachyura.

18

Phylogenetic Mosaic of an Arms Race with Asymmetrical Sexual Conflict and Its Macroevolutionary Consequences in a Lineage of Small Water Striders

Li, Z.; Chen, H.; Jin, Z.; Freitag, H.; Hecher, C.; Zettel, H.; Fu, S.; Liu, C.; Qiao, M.; Guo, B.; Bu, W.; Ye, Z.

2026-06-30 evolutionary biology 10.64898/2026.06.24.734260 medRxiv

Top 0.4%

3.1%

Sexual conflict has been hypothesized as a driver of speciation, though its effects are likely heterogeneous across phylogenies and between sexes. Semi-aquatic bugs, which inhabit water surfaces across diverse aquatic environments, have long served as a model for studying sexual conflict. While previous studies have focused on rapid antagonistic coevolution and the genetic basis of sexually antagonistic traits, the macroevolutionary consequences of asymmetrical sexual conflict (particularly male-dominated grasping traits versus female resistance) remain largely unexplored. Within the subgenus Pseudovelia, males exhibit pronounced phenotypic diversification in grasping structures, whereas females show modest, clade-specific resistance traits, suggesting male-biased asymmetric conflict. This system presents a valuable opportunity to examine how sexual conflict influences diversification and asymmetrical trait evolution across lineages. Using 204 individuals, representing over half of the subgenus's species diversity, we reconstructed a time-calibrated phylogeny, quantified diversification rates, assessed sexual conflict intensity across clades, and analyzed correlations between sexual trait evolution and diversification. Our results reveal extensive phylogenetic conflict, particularly within the East Asian clade, driven by introgression and incomplete lineage sorting (ILS). Furthermore, we observe significant phylogenetic heterogeneity in both phenotypic evolution and diversification rates. Notably, a male "trait package" enhancing grasping ability likely drives rapid diversification in the recently radiated "South China" lineage. In contrast, grasping traits involving abdominal segment VIII are associated with lower conflict intensity, facilitating greater evolutionary flexibility in female resistance and resulting in lineage-specific counter-adaptations. These findings highlight the heterogeneous dynamics of asymmetrical sexual conflict in shaping diversification and speciation.

19

An evaluation of clustering and assembly strategies from Iso-Seq data in the absence of reference genomes in non-model animals

Eleftheriadi, K.; Vazquez-Valls, M.; Fernandez, R.

2026-07-08 evolutionary biology 10.1101/2025.09.18.677004 medRxiv

Top 0.4%

3.0%

Transcriptome assembly enables the recovery of expressed genes and isoforms, but the optimal strategy for reconstructing transcriptomes from long-read sequencing remains unresolved. In particular, establishing best practices for generating accurate gene models and selecting representative isoforms is essential for comparative genomics, because orthology inference typically includes only the longest isoform per gene model. Here, we systematically compare clustering and de novo assembly methods using PacBio Iso-Seq data from 17 animal lineages spanning seven phyla, most of them non-model species, with the goal of determining which methodology is best suited to selecting one isoform per gene model in the absence of pipelines designed for that purpose. We evaluate four approaches: isoseq cluster, CD-HIT, RNA-Bloom2, and isONform. We benchmark them against short-read assemblies produced with Trinity, assessing assembly quality with BUSCO completeness, short-read mapping rates, coding sequence recovery, and longest isoform prediction. Our results show that CD-HIT clustering at high similarity thresholds (≥99%) yields the most complete and coding-rich long-read transcriptomes, rivaling Trinity while avoiding its high redundancy. Consensus-based methods such as isoseq cluster and isONform recover fewer single-copy orthologs (mirrored in a lower BUSCO score) and achieve lower mapping rates, while RNA-Bloom2 provides intermediate performance with reduced duplication. Together, these findings establish CD-HIT, to date, as a robust and practical strategy for transcriptome reconstruction from long-read data when genomic references are unavailable. By benchmarking de novo methods across a taxonomically broad dataset, this work defines the realistic capabilities of long-read transcriptome reconstruction in the absence of a reference genome and provides practical guidance for deriving high-quality gene models and selecting representative isoforms for orthology inference in non-model species.

20

Evidence for the 1/e-law predicting optimal timing of reproduction across taxa

Froese, T.; Froese, R.; Bruss, T.

2026-07-03 evolutionary biology 10.64898/2026.06.30.733937 medRxiv

Top 0.4%

2.3%

Reproductive success requires allocating effort across the lifespan in a manner that balances the risk of early mortality against the benefit of higher fecundity or parental expertise that increase with body size or age. Here we report a cross-taxonomic analysis of reproductive schedules in plants, animals, and humans, showing that peak reproductive effort consistently occurs at approximately 1/e (~37%) of species-specific maximum lifespan. The pattern is robust across major phylogenetic groups and independent of absolute lifespan. This convergence is both logically and numerically consistent with the optimal stopping fraction (1/e), which maximizes the probability of selecting a superior option under uncertainty by delaying commitment until 1/e of the available options have been examined. By integrating population dynamics and empirical data with a formal decision-theoretic model, our results suggest a striking, previously unrecognized quantitative regularity linking lifespan and reproductive timing. These findings provide a unifying perspective on life-history evolution and suggest that complex biological scheduling strategies are governed by probabilistic principles.
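
The optimal stopping fraction mentioned in the abstract can be illustrated with the classical "secretary problem". The sketch below is not the authors' model or analysis; it only assumes the textbook setting (n options seen in random order, the first r merely observed, then the first option better than all earlier ones is chosen). The function names `success_probability` and `best_cutoff` are hypothetical and introduced here for illustration.

```python
import math
from typing import Tuple


def success_probability(n: int, r: int) -> float:
    """Chance of picking the best of n options when the first r are only
    observed and the first later option that beats all of them is chosen."""
    if r == 0:
        return 1.0 / n  # no observation phase: the first option is taken
    # Classical closed form: P(r) = (r / n) * sum_{j = r}^{n - 1} 1 / j
    return (r / n) * sum(1.0 / j for j in range(r, n))


def best_cutoff(n: int) -> Tuple[int, float]:
    """Return the cutoff r (and its success probability) that maximizes P(r)."""
    return max(
        ((r, success_probability(n, r)) for r in range(n)),
        key=lambda pair: pair[1],
    )


n = 1000
r, p = best_cutoff(n)
# Both the optimal cutoff fraction r / n and the success probability
# approach 1 / e as n grows.
print(f"cutoff fraction = {r / n:.3f}, success = {p:.3f}, 1/e = {1 / math.e:.3f}")
```

Under these assumptions, both the best cutoff fraction and the resulting success probability lie close to 1/e for large n, which is the quantity the abstract compares with the timing of peak reproductive effort.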